564 research outputs found

    Asynchronous Training of Word Embeddings for Large Text Corpora

    Word embeddings are a powerful approach for analyzing language and have been widely popular in numerous tasks in information retrieval and text mining. Training embeddings over huge corpora is computationally expensive because the input is typically sequentially processed and parameters are synchronously updated. Distributed architectures for asynchronous training that have been proposed either focus on scaling vocabulary sizes and dimensionality or suffer from expensive synchronization latencies. In this paper, we propose a scalable approach to train word embeddings by partitioning the input space instead in order to scale to massive text corpora while not sacrificing the performance of the embeddings. Our training procedure does not involve any parameter synchronization except a final sub-model merge phase that typically executes in a few minutes. Our distributed training scales seamlessly to large corpus sizes and we get comparable and sometimes even up to 45% performance improvement in a variety of NLP benchmarks using models trained by our distributed procedure, which requires 1/10 of the time taken by the baseline approach. Finally, we also show that we are robust to missing words in sub-models and are able to effectively reconstruct word representations. Comment: This paper contains 9 pages and has been accepted in the WSDM201

    V-I characteristics in the vicinity of order-disorder transition in vortex matter

    The shape of the V-I characteristics leading to a peak in the differential resistance r_d=dV/dI in the vicinity of the order-disorder transition in NbSe2 is investigated. r_d is large when measured by dc current. However, for a small I_ac on a dc bias r_d decreases rapidly with frequency, even at a few Hz, and displays a large out-of-phase signal. In contrast, the ac response increases with frequency in the absence of dc bias. These surprisingly opposite phenomena and the peak in r_d are shown to result from a dynamic coexistence of two vortex matter phases rather than from the commonly assumed plastic depinning. Comment: 12 pages 4 figures. Accepted for publication in PRB rapi

    High Prevalence and Genetic Diversity of HCV among HIV-1 Infected People from Various High-Risk Groups in China

    BACKGROUND: Co-infection with HIV-1 and HCV is a significant global public health problem and a major consideration for anti-HIV-1 treatment. HCV infection among HIV-1 positive people who are eligible for the newly launched nationwide anti-HIV-1 treatment program in China has not been well characterized. METHODOLOGY: A nationwide survey of HIV-1 positive injection drug users (IDU), former paid blood donors (FBD), and sexually transmitted cases from multiple provinces including the four most affected provinces in China was conducted. HCV prevalence and genetic diversity were determined. We found that IDU and FBD have extremely high rates of HCV infection (97% and 93%, respectively). Surprisingly, people who acquired HIV-1 through sexual contact also had a higher rate of HCV infection (20%) than the general population. HIV-1 subtype and HCV genotypes were amazingly similar among FBD from multiple provinces stretching from Central to Northeast China. However, although patterns of overland trafficking of heroin and distinct HIV-1 subtypes could be detected among IDU, HCV genotypes of IDU were more diverse and exhibited significant regional differences. CONCLUSION: Emerging HIV-1 and HCV co-infection and possible sexual transmission of HCV in China require urgent prevention measures and should be taken into consideration in the nationwide antiretroviral treatment program.

    HCV 6a Prevalence in Guangdong Province Had the Origin from Vietnam and Recent Dissemination to Other Regions of China: Phylogeographic Analyses

    Recently in China, HCV 6a infection has shown a fast increase among patients and blood donors, possibly due to IDU linked transmission. We recruited 210 drug users in Shanwei city, Guangdong province. Among them, HCV RNA was detected in 150 (71.4%), both E1 and NS5B genes were sequenced in 136, and 6a genotyped in 70. Of the 6a sequences, most were grouped into three clusters while 23% represent emerging strains. For coalescent analysis, additional 6a sequences were determined among 21 blood donors from Vietnam, 22 donors from 12 provinces of China, and 36 IDUs from Liuzhou City in Guangxi Province. Phylogeographic analyses indicated that Vietnam could be the origin of 6a in China. The Guangxi Province, which borders Vietnam, could be the first region to accept 6a for circulation. Migration from Yunnan, which also borders Vietnam, might be equally important, but it was only detected among IDUs in limited regions. From Guangxi, 6a could have further spread to Guangdong, Yunnan, Hainan, and Hubei provinces. However, evidence showed that only in Guangdong has 6a become a local epidemic, making Guangdong the second source region to disseminate 6a to the other 12 provinces. With a rate of 2.737×10⁻³ (95% CI: 1.792×10⁻³ to 3.745×10⁻³), a Bayesian Skyline Plot was portrayed. It revealed an exponential 6a growth during 1994-1998, while before and after 1994-1998 slow 6a growths were maintained. Concurrently, 1994-1998 corresponded to a period when contaminated blood transfusion was common, which caused many people to be infected with HIV and HCV, until the Chinese government outlawed the use of paid blood donations in 1998. With an origin from Vietnam, 6a has become a local epidemic in Guangdong Province, where an increasing prevalence has subsequently led to 6a spread to many other regions of China.

    Electron correlation effects in electron-hole recombination in organic light-emitting diodes

    We develop a general theory of electron-hole recombination in organic light emitting diodes that leads to formation of emissive singlet excitons and nonemissive triplet excitons. We briefly review other existing theories and show how our approach is substantively different from these theories. Using an exact time-dependent approach to the interchain/intermolecular charge-transfer within a long-range interacting model we find that (i) the relative yield of the singlet exciton in polymers is considerably larger than the 25% predicted from statistical considerations, (ii) the singlet exciton yield increases with chain length in oligomers, and (iii) in small molecules containing nitrogen heteroatoms, the relative yield of the singlet exciton is considerably smaller and may be even close to 25%. The above results are independent of whether or not the bond-charge repulsion, X_perp, is included in the interchain part of the Hamiltonian for the two-chain system. The larger (smaller) yield of the singlet (triplet) exciton in carbon-based long-chain polymers is a consequence of both its ionic (covalent) nature and smaller (larger) binding energy. In nitrogen containing monomers, wavefunctions are closer to the noninteracting limit, and this decreases (increases) the relative yield of the singlet (triplet) exciton. Our results are in qualitative agreement with electroluminescence experiments involving both molecular and polymeric light emitters. The time-dependent approach developed here for describing intermolecular charge-transfer processes is completely general and may be applied to many other such processes. Comment: 19 pages, 11 figures

    Naturally occurring mutations in the PA gene are key contributors to increased virulence of pandemic H1N1/09 influenza virus in mice

    We examined the molecular basis of virulence of pandemic H1N1/09 influenza viruses by reverse genetics based on two H1N1/09 virus isolates (A/California/04/2009 [CA04] and A/swine/Shandong/731/2009 [SD731]) with contrasting pathogenicities in mice. We found that four amino acid mutations (P224S in the PA protein [PA-P224S], PB2-T588I, NA-V106I, and NS1-I123V) contributed to the lethal phenotype of SD731. In particular, the PA-P224S mutation when combined with PA-A70V in CA04 drastically reduced the virus's 50% mouse lethal dose (LD50), by almost 1,000-fold

    Validating module network learning algorithms using simulated data

    In recent years, several authors have used probabilistic graphical models to learn expression modules and their regulatory programs from gene expression data. Here, we demonstrate the use of the synthetic data generator SynTReN for the purpose of testing and comparing module network learning algorithms. We introduce a software package for learning module networks, called LeMoNe, which incorporates a novel strategy for learning regulatory programs. Novelties include the use of a bottom-up Bayesian hierarchical clustering to construct the regulatory programs, and the use of a conditional entropy measure to assign regulators to the regulation program nodes. Using SynTReN data, we test the performance of LeMoNe in a completely controlled situation and assess the effect of the methodological changes we made with respect to an existing software package, namely Genomica. Additionally, we assess the effect of various parameters, such as the size of the data set and the amount of noise, on the inference performance. Overall, application of Genomica and LeMoNe to simulated data sets gave comparable results. However, LeMoNe offers some advantages, one of them being that the learning process is considerably faster for larger data sets. Additionally, we show that the location of the regulators in the LeMoNe regulation programs and their conditional entropy may be used to prioritize regulators for functional validation, and that the combination of the bottom-up clustering strategy with the conditional entropy-based assignment of regulators improves the handling of missing or hidden regulators. Comment: 13 pages, 6 figures + 2 pages, 2 figures supplementary information

    Cross-protection against European swine influenza viruses in the context of infection immunity against the 2009 pandemic H1N1 virus : studies in the pig model of influenza

    Pigs are natural hosts for the same influenza virus subtypes as humans and are a valuable model for cross-protection studies with influenza. In this study, we have used the pig model to examine the extent of virological protection between a) the 2009 pandemic H1N1 (pH1N1) virus and three different European H1 swine influenza virus (SIV) lineages, and b) these H1 viruses and a European H3N2 SIV. Pigs were inoculated intranasally with representative strains of each virus lineage with 6- and 17-week intervals between H1 inoculations and between H1 and H3 inoculations, respectively. Virus titers in nasal swabs and/or tissues of the respiratory tract were determined after each inoculation. There was substantial though differing cross-protection between pH1N1 and other H1 viruses, which was directly correlated with the relatedness in the viral hemagglutinin (HA) and neuraminidase (NA) proteins. Cross-protection against H3N2 was almost complete in pigs with immunity against H1N2, but was weak in H1N1/pH1N1-immune pigs. In conclusion, infection with a live, wild type influenza virus may offer substantial cross-lineage protection against viruses of the same HA and/or NA subtype. True heterosubtypic protection, in contrast, appears to be minimal in natural influenza virus hosts. We discuss our findings in the light of the zoonotic and pandemic risks of SIVs

    Sequence-based prediction for vaccine strain selection and identification of antigenic variability in foot-and-mouth disease virus

    Identifying when past exposure to an infectious disease will protect against newly emerging strains is central to understanding the spread and the severity of epidemics, but the prediction of viral cross-protection remains an important unsolved problem. For foot-and-mouth disease virus (FMDV) research in particular, improved methods for predicting this cross-protection are critical for predicting the severity of outbreaks within endemic settings where multiple serotypes and subtypes commonly co-circulate, as well as for deciding whether appropriate vaccine(s) exist and how much they could mitigate the effects of any outbreak. To identify antigenic relationships and their predictors, we used linear mixed effects models to account for variation in pairwise cross-neutralization titres using only viral sequences and structural data. We identified those substitutions in surface-exposed structural proteins that are correlates of loss of cross-reactivity. These allowed prediction of both the best vaccine match for any single virus and the breadth of coverage of new vaccine candidates from their capsid sequences as effectively as or better than serology. Sub-sequences chosen by the model-building process all contained sites that are known epitopes on other serotypes. Furthermore, for the SAT1 serotype, for which epitopes have never previously been identified, we provide strong evidence - by controlling for phylogenetic structure - for the presence of three epitopes across a panel of viruses and quantify the relative significance of some individual residues in determining cross-neutralization. Identifying and quantifying the importance of sites that predict viral strain cross-reactivity not just for single viruses but across entire serotypes can help in the design of vaccines with better targeting and broader coverage. These techniques can be generalized to any infectious agents where cross-reactivity assays have been carried out. 
    As the parameterization uses pre-existing datasets, this approach quickly and cheaply increases both our understanding of antigenic relationships and our power to control disease.